import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib_scalebar.scalebar import ScaleBar
import osmnx as ox
# Set display option to show all columns
pd.set_option('display.max_columns', None)
Lab 1: Geographic Data and Spatial Models¶
Welcome to today's lab session!
In this session, we will delve into the functionalities of two powerful Python libraries: Geopandas and GurobiPy.
- Geopandas:
Geopandas is a Python library that facilitates working with geospatial data. It extends the capabilities of Pandas, a widely used data manipulation library, to support geographic data types and operations. Throughout this lab, we will learn how to read, write, visualize, and analyze geographic data using Geopandas.
- GurobiPy:
GurobiPy, or Gurobi Optimization Python Interface, is a Python package that provides access to the Gurobi Optimizer, a powerful mathematical optimization solver. Gurobi is widely used in various industries for solving linear programming, integer programming, quadratic programming, and other optimization problems. In this lab, we will explore the basics of using GurobiPy to formulate and solve optimization problems.
- Knapsack Problem:
As part of our exploration, we will delve into a classic optimization problem known as the Knapsack Problem. This problem involves selecting a combination of items to maximize value while staying within a given weight constraint. We will apply the concepts learned from Geopandas and GurobiPy to solve practical scenarios, such as selecting parcels to buy under various conditions, resembling real-world decision-making scenarios.
Throughout the lab, we encourage you to actively engage in hands-on exercises, experiment with code snippets, and explore additional functionalities of Geopandas and GurobiPy. Feel free to ask questions, collaborate with your peers, and enjoy the journey of discovering the capabilities of these powerful Python libraries.
Lab Structure¶
Lab 1 is due in two weeks and consists of two chapters.
Chapter 1 is to figure out which parcel you'd like to buy within given budget.
Chpater 2 is to figure out the distribution of the distance between wildfire ignitions and major roads.
Chapter 1. Which parcel do you want to buy?¶
Chapter 1 performs an exploratory data analysis on Parcel Data through Geopandas.
Then, utilizing given information, a Knapsack problem will be solved using GurobiPy solver.
You'll get to know which parcels you'd like to buy with given budget.
Step 1. Explore Santa Barbara Parcel Data¶
# Read parcel data in
parcels = gpd.read_file("Lab1Data/parcel_SB.shp")
# Plot your parcel data
parcels.plot()
<Axes: >
parcels
| OBJECTID | APN | LAYER | Situs1 | Situs2 | Acreage | LandUse | UseCode | PctTransf | LandValue | StrImpr | LivImpr | NetSecVal | Net_Impr | YearBuilt | Bedrooms | Bathrooms | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 131101 | 041-350-024 | Ground | 2315 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.52 | SINGLE FAMILY RESIDENCE | 0100 | 100.0 | 288614 | 331840 | 0 | 620454 | 331840 | 1967 | 4 | 3.00 | POLYGON ((6039997.884 1971525.657, 6039982.106... |
| 1 | 131102 | 041-350-012 | Ground | 2307 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.18 | SINGLE FAMILY RESIDENCE | 0100 | 26.6 | 550902 | 1085141 | 0 | 1636043 | 1085141 | 2006 | 3 | 3.50 | POLYGON ((6040044.305 1971477.086, 6040032.277... |
| 2 | 131103 | 041-350-015 | Ground | 2211 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.59 | SINGLE FAMILY RESIDENCE | 0100 | 100.0 | 1641000 | 175000 | 0 | 1816000 | 175000 | 1954 | 2 | 2.50 | POLYGON ((6040237.820 1971288.705, 6040179.849... |
| 3 | 131104 | 041-350-016 | Ground | 2201 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.50 | SINGLE FAMILY RESIDENCE | 0100 | 100.0 | 1644611 | 1482404 | 0 | 3127015 | 1482404 | 1948 | 3 | 3.00 | POLYGON ((6040320.008 1971271.353, 6040268.735... |
| 4 | 131105 | 041-350-017 | Ground | 2111 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.76 | SINGLE FAMILY RESIDENCE | 0100 | 0.0 | 63587 | 156682 | 0 | 213269 | 156682 | 1950 | 4 | 3.00 | POLYGON ((6040450.139 1971243.880, 6040398.978... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1867 | 250105 | 045-290-008 | Condo Third Floor | 30 BARRANCA AVE 3 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0323 | 100.0 | 46816 | 70592 | 0 | 117408 | 70592 | 1973 | 2 | 1.75 | POLYGON ((6048328.471 1972754.300, 6048325.112... |
| 1868 | 250106 | 045-290-009 | Condo Third Floor | 36 BARRANCA AVE 6 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0323 | 100.0 | 750000 | 668000 | 0 | 1418000 | 668000 | 1973 | 2 | 1.75 | POLYGON ((6048287.470 1972741.338, 6048291.580... |
| 1869 | 250107 | 045-290-010 | Condo Third Floor | 36 BARRANCA AVE 5 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0323 | 100.0 | 309553 | 206366 | 0 | 515919 | 206366 | 1973 | 2 | 2.00 | POLYGON ((6048283.551 1972700.429, 6048271.984... |
| 1870 | 250108 | 045-290-011 | Condo Third Floor | 40 BARRANCA AVE 3 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0323 | 100.0 | 124884 | 187149 | 0 | 312033 | 187149 | 1973 | 2 | 1.75 | POLYGON ((6048234.146 1972677.397, 6048238.386... |
| 1871 | 250109 | 045-330-008 | Condo Third Floor | 101 OCEANO AVE 5 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0332 | 100.0 | 305105 | 298615 | 0 | 603720 | 298615 | 1961 | 2 | 1.75 | POLYGON ((6048280.057 1972978.876, 6048296.690... |
1872 rows × 18 columns
# Check the data types of your parcel data
parcels.dtypes
OBJECTID int64 APN object LAYER object Situs1 object Situs2 object Acreage float64 LandUse object UseCode object PctTransf float64 LandValue int64 StrImpr int64 LivImpr int64 NetSecVal int64 Net_Impr int64 YearBuilt object Bedrooms int64 Bathrooms float64 geometry geometry dtype: object
# Check the Coordiante System of your parcel data
parcels.crs
<Projected CRS: EPSG:2229> Name: NAD83 / California zone 5 (ftUS) Axis Info [cartesian]: - X[east]: Easting (US survey foot) - Y[north]: Northing (US survey foot) Area of Use: - name: United States (USA) - California - counties Kern; Los Angeles; San Bernardino; San Luis Obispo; Santa Barbara; Ventura. - bounds: (-121.42, 32.76, -114.12, 35.81) Coordinate Operation: - name: SPCS83 California zone 5 (US Survey feet) - method: Lambert Conic Conformal (2SP) Datum: North American Datum 1983 - Ellipsoid: GRS 1980 - Prime Meridian: Greenwich
# Plot your parcel data with the column "LandValue"
ax = parcels.plot("LandValue", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel LandValue")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
<matplotlib_scalebar.scalebar.ScaleBar at 0x143f89b10>
# Plot with a column "NetSecVal", Net Secured Value for each parcel.
ax = parcels.plot("NetSecVal", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel Net Secured Value")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
<matplotlib_scalebar.scalebar.ScaleBar at 0x143f4c850>
# Visualize another column, "YearBuilt" and check out a problem
ax = parcels.plot("YearBuilt", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel YearBuilt")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
<matplotlib_scalebar.scalebar.ScaleBar at 0x14400c590>
# Yes, there are two problems.
# 1. Legend is too long.
# 2. Some parcels are disappeared.
# Therefore, let's see what's happening in the YearBuilt column.
# Firstly, let's print out the raw values.
parcels["YearBuilt"]
0 1967
1 2006
2 1954
3 1948
4 1950
...
1867 1973
1868 1973
1869 1973
1870 1973
1871 1961
Name: YearBuilt, Length: 1872, dtype: object
# I see. They are in string type, not in numeric type.
# That's why the legend was categorical, not a continuous colorbar.
# So we are going to replace the YearBuilt columns, with the same values in float data type.
# You can convert the data type with the method, .astype()
parcels["YearBuilt"]=parcels["YearBuilt"].astype("float")
parcels["YearBuilt"]
0 1967.0
1 2006.0
2 1954.0
3 1948.0
4 1950.0
...
1867 1973.0
1868 1973.0
1869 1973.0
1870 1973.0
1871 1961.0
Name: YearBuilt, Length: 1872, dtype: float64
# Also, as there were some missing parcels, let's check if there's any Null values in YearBuilt column.
parcels["YearBuilt"].isnull().sum()
135
# Oh.. there were 135 null values, that's why there are some blanks. 135 parcels out of 1872 are true.
# This is a data limitation or there must be some reason of the null values.
# So let's accept the fact that there exist null values.
# Now, plot the parcels GeoDataFrame with the "YearBuilt" column.
# Is there still something non-plausible..?
ax = parcels.plot("YearBuilt", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel YearBuilt")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
<matplotlib_scalebar.scalebar.ScaleBar at 0x143ed0110>
# How can there be some buildings built so long ago..? Year 0..? 2024 years ago?!
# Let's check out the min and max values of the YearBuilt column values.
parcels["YearBuilt"].min(), parcels["YearBuilt"].max()
(0.0, 2015.0)
# Certainly, it must mean that the parcels with the year value 0 are actually Null values.
# So let's replace them with Null, so the plot will show better spatial variation.
parcels.loc[parcels["YearBuilt"]==0, "YearBuilt"] = None
parcels.loc[parcels["YearBuilt"]==0]
| OBJECTID | APN | LAYER | Situs1 | Situs2 | Acreage | LandUse | UseCode | PctTransf | LandValue | StrImpr | LivImpr | NetSecVal | Net_Impr | YearBuilt | Bedrooms | Bathrooms | geometry |
|---|
### Your Turn!! Let's visualize the YearBuilt column with ScaleBar, once again. Does it look plausible?
ax = parcels.plot("YearBuilt", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel YearBuilt")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
<matplotlib_scalebar.scalebar.ScaleBar at 0x143faaa50>
Step 2. Subset the Residential Parcels for Optimization¶
# Let's check what the dataset has in "LandUse" column using the .value_counts() method.
# It returns the unique values and frequency.
parcels["LandUse"].value_counts()
LandUse SINGLE FAMILY RESIDENCE 1560 CONDOS,COMMUNITY APT PROJS 182 RESIDENTIAL INCOME, 2-4 UNITS 42 APARTMENTS, 5 OR MORE UNITS 29 VACANT 25 PARKS 5 OFFICE BUILDINGS, MULTI-STORY 5 PARKING LOTS 3 SUPERMARKETS 2 RESTAURANTS,BARS 1 SERVICE STATIONS 1 RETAIL STORES, SINGLE STORY 1 CHURCHES, RECTORY 1 STORE AND OFFICE COMBINATION 1 MIXED USE-COMMERCIAL/RESIDENTIAL 1 PROFESSIONAL BUILDINGS 1 OFFICE BUILDINGS, SINGLE STORY 1 COMMERCIAL AND OFFICE CONDOS,PUDS 1 Name: count, dtype: int64
parcels
| OBJECTID | APN | LAYER | Situs1 | Situs2 | Acreage | LandUse | UseCode | PctTransf | LandValue | StrImpr | LivImpr | NetSecVal | Net_Impr | YearBuilt | Bedrooms | Bathrooms | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 130716 | 075-010-022 | Ground | 6875 EL COLEGIO RD | GOLETA, CA 93117 | 9.60 | SCHOOLS | 7200 | 0.0 | 0 | 0 | 0 | 0 | 0 | None | 0 | 0.0 | POLYGON ((5997851.219 1979476.854, 5998251.155... |
| 1 | 130717 | 075-010-039 | Ground | None | None | 0.83 | VACANT | 0090 | 100.0 | 0 | 0 | 0 | 0 | 0 | None | 0 | 0.0 | POLYGON ((5998411.116 1979466.012, 5998407.771... |
| 2 | 130718 | 075-010-040 | Ground | None | None | 0.35 | VACANT | 0090 | 100.0 | 0 | 0 | 0 | 0 | 0 | None | 0 | 0.0 | POLYGON ((5998631.052 1979460.936, 5998407.771... |
| 3 | 130719 | 075-010-037 | Ground | None | None | 21.60 | VACANT | 0090 | 100.0 | 0 | 0 | 0 | 0 | 0 | None | 0 | 0.0 | POLYGON ((5998749.878 1979458.194, 5998727.559... |
| 4 | 130720 | 075-010-028 | Ground | 6795 EL COLEGIO RD | GOLETA, CA 93117 | 11.78 | VACANT | 0090 | 0.0 | 0 | 0 | 0 | 0 | 0 | None | 0 | 0.0 | POLYGON ((5999395.829 1979443.282, 5999388.511... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 912 | 244502 | 075-230-001 | Condo First Floor | 6636 DEL PLAYA DR # 1 | GOLETA, CA 93117 | 0.03 | CONDOS,COMMUNITY APT PROJS | 0300 | 100.0 | 308718 | 308718 | 0 | 617436 | 308718 | 2003 | 1 | 1.0 | POLYGON ((6000659.415 1976779.651, 6000658.790... |
| 913 | 244503 | 075-230-003 | Condo First Floor | 6636 DEL PLAYA DR # 3 | GOLETA, CA 93117 | 0.03 | CONDOS,COMMUNITY APT PROJS | 0300 | 100.0 | 308718 | 308718 | 0 | 617436 | 308718 | 2003 | 1 | 1.0 | POLYGON ((6000659.596 1976790.249, 6000659.432... |
| 914 | 248854 | 075-230-002 | Condo Second Floor | 6636 DEL PLAYA DR # 2 | GOLETA, CA 93117 | 0.03 | CONDOS,COMMUNITY APT PROJS | 0300 | 100.0 | 308718 | 308718 | 0 | 617436 | 308718 | 2003 | 1 | 2.0 | POLYGON ((6000646.610 1976774.894, 6000646.553... |
| 915 | 248855 | 075-230-004 | Condo Second Floor | 6636 DEL PLAYA DR # 4 | GOLETA, CA 93117 | 0.03 | CONDOS,COMMUNITY APT PROJS | 0300 | 100.0 | 308718 | 308718 | 0 | 617436 | 308718 | 2003 | 1 | 2.0 | POLYGON ((6000659.898 1976810.348, 6000659.596... |
| 916 | 239415 | 075-020-039 | Right of Way (Private) | None | None | 0.52 | HIGHWAYS AND STREETS | 8800 | 100.0 | 154 | 0 | 0 | 154 | 0 | None | 0 | 0.0 | POLYGON ((6000356.910 1978984.122, 6000353.435... |
917 rows × 18 columns
# Let's take a subset of the original parcels data, sothat only residential parcels can be included.
res_parcels = parcels.loc[parcels["LandUse"].isin(["SINGLE FAMILY RESIDENCE",
"CONDOS,COMMUNITY APT PROJS",
"RESIDENTIAL INCOME, 2-4 UNITS",
"APARTMENTS, 5 OR MORE UNITS"])]
res_parcels["LandUse"].value_counts()
# In .loc[], the first argument selects rows based on a condition, and the optional second argument specifies columns.
# The .isin() function generates a binary series indicating whether each value is present in the given array.
LandUse SINGLE FAMILY RESIDENCE 1560 CONDOS,COMMUNITY APT PROJS 182 RESIDENTIAL INCOME, 2-4 UNITS 42 APARTMENTS, 5 OR MORE UNITS 29 Name: count, dtype: int64
# Also, let's say that we want the parcels where YearBuilt is identified.
# Check out how many parcels don't have YearBuilt identified.
res_parcels["YearBuilt"].isnull().sum()
77
res_parcels = res_parcels.reset_index(drop=True)
# Then, let's drop them. 77 parcels don't have YearBuilt values.
res_parcels.loc[res_parcels["YearBuilt"].isnull()].index
res_parcels = res_parcels.drop(res_parcels.loc[res_parcels["YearBuilt"].isnull()].index).reset_index(drop=True)
# Check the output GeoDataFrame, anything non-plausible?
len(res_parcels)
1736
# How would you fix this?
#There is one parcel with zero bedrooms. Want to create a column that shows whether parcel has zero bedrooms or not.
res_parcels["Bedroom_"] = res_parcels["Bedrooms"] > 0
res_parcels
| OBJECTID | APN | LAYER | Situs1 | Situs2 | Acreage | LandUse | UseCode | PctTransf | LandValue | StrImpr | LivImpr | NetSecVal | Net_Impr | YearBuilt | Bedrooms | Bathrooms | geometry | Bedroom_ | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 131101 | 041-350-024 | Ground | 2315 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.52 | SINGLE FAMILY RESIDENCE | 0100 | 100.0 | 288614 | 331840 | 0 | 620454 | 331840 | 1967.0 | 4 | 3.00 | POLYGON ((6039997.884 1971525.657, 6039982.106... | True |
| 1 | 131102 | 041-350-012 | Ground | 2307 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.18 | SINGLE FAMILY RESIDENCE | 0100 | 26.6 | 550902 | 1085141 | 0 | 1636043 | 1085141 | 2006.0 | 3 | 3.50 | POLYGON ((6040044.305 1971477.086, 6040032.277... | True |
| 2 | 131103 | 041-350-015 | Ground | 2211 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.59 | SINGLE FAMILY RESIDENCE | 0100 | 100.0 | 1641000 | 175000 | 0 | 1816000 | 175000 | 1954.0 | 2 | 2.50 | POLYGON ((6040237.820 1971288.705, 6040179.849... | True |
| 3 | 131104 | 041-350-016 | Ground | 2201 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.50 | SINGLE FAMILY RESIDENCE | 0100 | 100.0 | 1644611 | 1482404 | 0 | 3127015 | 1482404 | 1948.0 | 3 | 3.00 | POLYGON ((6040320.008 1971271.353, 6040268.735... | True |
| 4 | 131105 | 041-350-017 | Ground | 2111 EDGEWATER WAY | SANTA BARBARA, CA 93109 | 0.76 | SINGLE FAMILY RESIDENCE | 0100 | 0.0 | 63587 | 156682 | 0 | 213269 | 156682 | 1950.0 | 4 | 3.00 | POLYGON ((6040450.139 1971243.880, 6040398.978... | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1731 | 250105 | 045-290-008 | Condo Third Floor | 30 BARRANCA AVE 3 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0323 | 100.0 | 46816 | 70592 | 0 | 117408 | 70592 | 1973.0 | 2 | 1.75 | POLYGON ((6048328.471 1972754.300, 6048325.112... | True |
| 1732 | 250106 | 045-290-009 | Condo Third Floor | 36 BARRANCA AVE 6 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0323 | 100.0 | 750000 | 668000 | 0 | 1418000 | 668000 | 1973.0 | 2 | 1.75 | POLYGON ((6048287.470 1972741.338, 6048291.580... | True |
| 1733 | 250107 | 045-290-010 | Condo Third Floor | 36 BARRANCA AVE 5 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0323 | 100.0 | 309553 | 206366 | 0 | 515919 | 206366 | 1973.0 | 2 | 2.00 | POLYGON ((6048283.551 1972700.429, 6048271.984... | True |
| 1734 | 250108 | 045-290-011 | Condo Third Floor | 40 BARRANCA AVE 3 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0323 | 100.0 | 124884 | 187149 | 0 | 312033 | 187149 | 1973.0 | 2 | 1.75 | POLYGON ((6048234.146 1972677.397, 6048238.386... | True |
| 1735 | 250109 | 045-330-008 | Condo Third Floor | 101 OCEANO AVE 5 | SANTA BARBARA, CA 93109 | 0.00 | CONDOS,COMMUNITY APT PROJS | 0332 | 100.0 | 305105 | 298615 | 0 | 603720 | 298615 | 1961.0 | 2 | 1.75 | POLYGON ((6048280.057 1972978.876, 6048296.690... | True |
1736 rows × 19 columns
# Then, with the residential parcels having YearBuilt identified,
# let's check the spatial distribution of the number of Bedrooms. Plot with a column "Bedrooms"
res_parcels.plot("Bedroom_", legend = True)
<Axes: >
# Also, let's check the spatial distribution of the number of Bathrooms.
res_parcels.plot("Bathrooms", figsize=(15,5), legend=True, categorical=True, cmap="YlOrRd")
plt.axis("off")
plt.title("The number of Bathrooms")
Text(0.5, 1.0, 'The number of Bathrooms')
# Let's find the parcels, which doesn't have any Bedroom.
res_parcels.loc[res_parcels["Bedroom_"] == False]
| OBJECTID | APN | LAYER | Situs1 | Situs2 | Acreage | LandUse | UseCode | PctTransf | LandValue | StrImpr | LivImpr | NetSecVal | Net_Impr | YearBuilt | Bedrooms | Bathrooms | geometry | Bedroom_ | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1139 | 155252 | 045-185-006 | Ground | 1419 SHORELINE DR | SANTA BARBARA, CA 93109 | 0.32 | SINGLE FAMILY RESIDENCE | 0100 | 100.0 | 77851 | 128819 | 0 | 199670 | 128819 | 1946.0 | 0 | 2.0 | POLYGON ((6044967.574 1971082.045, 6044995.993... | False |
# Let's create a new column, "bedroom" in the GeoDataFrame. (column name is case-sensitive)
# The "bedroom" column will have a binary (True/False) value indicating if the parcel has greater than zero Bedrooms.
# Parcel 1139 has no bedroom.
# Now visualize the binary column, "bedroom".
Step 3. Optimization using GurobiPy: Knapsack Problem¶
Knapsack Problem Equations¶
Notations:
The objective is to maximize the total value of items selected while keeping the total weight within the knapsack capacity.
Let:
- ( $n$ ) be the number of items.
- ( $v_i$ ) be the value of the ( i )th item.
- ( $w_i$ ) be the weight of the ( i )th item.
- ( $W$ ) be the maximum capacity of the knapsack.
where:
- ( $x_i$ ) is a binary decision variable:
- ( $x_i$ = 1 ) if the ( i )th item is selected.
- ( $x_i$ = 0 ) otherwise.
Complete Formulation:
The complete formulation of the Knapsack problem is as follows:
Maximize:
$ \sum_{i=1}^{n} v_i \cdot x_i $
Subject to:
$ \sum_{i=1}^{n} w_i \cdot x_i \leq W $
$ x_i \in \{0, 1\}, \quad \forall i = 1, 2, \ldots, n $
list(range(len(res_parcels)))[-1]
1735
# Let's import the GurobiPy into this Notebook.
import gurobipy as gp
from gurobipy import GRB
# Define the data (weights, values, and knapsack capacity)
objective_weights = res_parcels['Bedrooms'] # Example weights of items
constraint_weights = res_parcels['LandValue'] # Example values of items
BUDGET = 1000000 # Knapsack capacity
#There is a better way(?) You want to maximize number of bedrooms with a budget of 100000 dollars
# Create a new Gurobi model
m = gp.Model("knapsack")
# Add decision variables for each item (binary variables indicating whether to include the item in the knapsack or not)
x = m.addVars(len(res_parcels), vtype=GRB.BINARY, name="x")
# Set objective function: maximize total value
m.setObjective(sum(x[i]*objective_weights[i] for i in range(len(res_parcels))), sense=GRB.MAXIMIZE)
# Add constraint: total weight of selected items must not exceed knapsack capacity (BUDGET)
# m.addConstr(sum(x[i]*constraint_weights[i] for i in range(len(res_parcels)))<= BUDGET)
# Optimize the model
m.optimize()
Restricted license - for non-production use only - expires 2025-11-24 Gurobi Optimizer version 11.0.1 build v11.0.1rc0 (mac64[arm] - Darwin 22.1.0 22A400) CPU model: Apple M1 Thread count: 8 physical cores, 8 logical processors, using up to 8 threads Optimize a model with 0 rows, 1736 columns and 0 nonzeros Model fingerprint: 0x39170ef2 Variable types: 0 continuous, 1736 integer (1736 binary) Coefficient statistics: Matrix range [0e+00, 0e+00] Objective range [1e+00, 2e+01] Bounds range [1e+00, 1e+00] RHS range [0e+00, 0e+00] Found heuristic solution: objective 5271.0000000 Explored 0 nodes (0 simplex iterations) in 0.00 seconds (0.00 work units) Thread count was 1 (of 8 available processors) Solution count 1: 5271 Optimal solution found (tolerance 1.00e-04) Best objective 5.271000000000e+03, best bound 5.271000000000e+03, gap 0.0000%
#NEW NAPSACK PROBLEM QUESTION #10
# Let's import the GurobiPy into this Notebook.
import gurobipy as gp
from gurobipy import GRB
# Define the data (weights, values, and knapsack capacity)
objective_weights = res_parcels['Bathrooms'] # Example weights of items
constraint_weights = res_parcels['LandValue'] # Example values of items
BUDGET = 1000000 # Knapsack capacity
#There is a better way(?) You want to maximize number of bedrooms with a budget of 100000 dollars
# Create a new Gurobi model
m = gp.Model("knapsack")
# Add decision variables for each item (binary variables indicating whether to include the item in the knapsack or not)
x = m.addVars(len(res_parcels), vtype=GRB.BINARY, name="x")
# Set objective function: maximize total value
m.setObjective(sum(x[i]*objective_weights[i] for i in range(len(res_parcels))), sense=GRB.MAXIMIZE)
# Add constraint: total weight of selected items must not exceed knapsack capacity (BUDGET)
# m.addConstr(sum(x[i]*constraint_weights[i] for i in range(len(res_parcels)))<= BUDGET)
# Optimize the model
m.optimize()
Gurobi Optimizer version 11.0.1 build v11.0.1rc0 (mac64[arm] - Darwin 22.1.0 22A400) CPU model: Apple M1 Thread count: 8 physical cores, 8 logical processors, using up to 8 threads Optimize a model with 0 rows, 1736 columns and 0 nonzeros Model fingerprint: 0xbec9e735 Variable types: 0 continuous, 1736 integer (1736 binary) Coefficient statistics: Matrix range [0e+00, 0e+00] Objective range [8e-01, 2e+01] Bounds range [1e+00, 1e+00] RHS range [0e+00, 0e+00] Found heuristic solution: objective 3441.5300000 Explored 0 nodes (0 simplex iterations) in 0.00 seconds (0.00 work units) Thread count was 1 (of 8 available processors) Solution count 1: 3441.53 Optimal solution found (tolerance 1.00e-04) Best objective 3.441530000000e+03, best bound 3.441530000000e+03, gap 0.0000%
! pip install gurobipy
Requirement already satisfied: gurobipy in /opt/anaconda3/lib/python3.11/site-packages (11.0.1)
# Now, add the model result to the GeoDataFrame
res_parcels["selected"] = [x[i].x > 0.5 for i in range(len(res_parcels))]
# Then Visualize the result
res_parcels.plot("selected", figsize=(15,5), legend=True, cmap="rainbow", edgecolor="black")
plt.axis("off")
plt.title("Selected Residential Parcels")
Text(0.5, 1.0, 'Selected Residential Parcels')
# Using folium library, create an interactive plot of showing the optimization result.
import folium
res_parcels = res_parcels.to_crs(4326)
# Create a Folium map centered around the mean latitude and longitude of the GeoDataFrame
map_center = [res_parcels.unary_union.centroid.y, res_parcels.unary_union.centroid.x]
m = folium.Map(location=map_center, zoom_start=15, control_scale=True)
# Define a style function to set colors based on the binary column
def style_function(feature):
color = 'purple' if feature['properties']['selected'] == 1 else 'pink'
return {'fillColor': color, 'color': color, 'weight': 1, 'fillOpacity': 0.6}
tooltip = folium.GeoJsonTooltip(
fields=["APN", "selected", "LandValue", "Bedrooms"]
)
# Add GeoDataFrame to the map with the defined style and tooltip functions
folium.GeoJson(
res_parcels,
style_function=style_function,
tooltip=tooltip
).add_to(m)
# Display the map
m
Chapter 2. Wildfires and Major Roads¶
Step 1. Download the Road Network Data in Santa Barbara¶
Note: OSMnx libarary provides the filtering functionality when downloading a graph.
However, this time we will not use it to practice data processing in Geopandas.
Hint: Santa Barbara Bounding Coordinates
- outhernmost Latitude: 34.2714° N
- Northernmost Latitude: 35.5295° N
- Westernmost Longitude: -120.6976° W
- Easternmost Longitude: -118.9600° W
north, south, east, west = 35.5295, 34.2714, -118.9600, -120.6976
# Fetch the road network
# It takes some time.. We have lots of roads in a county
graph = ox.graph_from_bbox(north, south, east, west,
network_type='drive')
# custom_filter='["highway"~"primary|trunk|motorway"]')
# Plot the road network
ox.plot_graph(graph)
# Convert it into GeoDataFrame
/var/folders/0f/r37lnqs969n4hggs14l43w280000gn/T/ipykernel_24414/689972490.py:5: FutureWarning: The `north`, `south`, `east`, and `west` parameters are deprecated and will be removed in the v2.0.0 release. Use the `bbox` parameter instead. See the OSMnx v2 migration guide: https://github.com/gboeing/osmnx/issues/1123 graph = ox.graph_from_bbox(north, south, east, west,
(<Figure size 800x800 with 1 Axes>, <Axes: >)
roads = ox.graph_to_gdfs(graph, nodes=False, edges=True)
roads.plot()
<Axes: >
# Check the unique values in "highway" column to select only Major Roads.
roads['highway'].value_counts()
highway residential 90449 tertiary 13276 secondary 7565 unclassified 4555 primary 3030 motorway_link 813 trunk 626 motorway 590 secondary_link 368 primary_link 157 [residential, unclassified] 146 living_street 136 [tertiary, residential] 113 tertiary_link 112 trunk_link 92 [trunk, motorway] 17 [tertiary, unclassified] 14 [secondary, tertiary] 11 road 8 [residential, living_street] 8 [secondary, unclassified] 4 [primary, motorway] 4 crossing 4 escape 3 [secondary, secondary_link] 3 [residential, motorway_link] 2 disused 2 [secondary, residential] 2 [secondary, residential, unclassified] 2 [tertiary, residential, unclassified] 2 [tertiary, tertiary_link] 2 [residential, tertiary_link] 1 [tertiary_link, residential] 1 [secondary_link, motorway_link] 1 [primary, motorway, motorway_link] 1 [trunk_link, primary_link] 1 [trunk_link, motorway_link] 1 [primary, motorway_link] 1 Name: count, dtype: int64
# Try to plot the road GeoDataFrame with a column "highway"
roads["highway_"] = roads["highway"].astype("str")
roads.plot("highway_", legend=True, figsize=(20,20))
plt.axis("off")
(-120.78512326500001, -118.872378035, 34.20852533, 35.59235327)
# Error? No worries. That's intended.
# Why do you think error happens? Hint: read the error message.
# How can we solve this issue? Let's stringify the column, "highway" and create a new column, "highway_"
# Then plot with the new column, "highway_"
# Now, let's create another column, "major", indicating whether each road segment is part of Major Roads or not.
# Let's consider the "primary" or "trunk" or "motorway" as Major Roads
roads["major"] = roads["highway_"].str.contains("primary|trunk|motorway")
# Then plot with the new column, "major"
roads.plot("major", legend=True, figsize=(20,20), cmap="Spectral_r")
plt.axis("off")
(-120.78512326500001, -118.872378035, 34.20852533, 35.59235327)
# Create a subset GeoDataFrame, "major_roads" by selecting the rows where major column value is True
major_roads = roads.loc[roads["major"]==True]
major_roads
| osmid | lanes | name | highway | maxspeed | oneway | reversed | length | geometry | ref | bridge | access | tunnel | junction | width | highway_ | major | |||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| u | v | key | |||||||||||||||||
| 95182337 | 2923989140 | 0 | 298912991 | NaN | NaN | motorway_link | NaN | True | False | 19.254 | LINESTRING (-119.19059 34.27417, -119.19040 34... | NaN | NaN | NaN | NaN | NaN | NaN | motorway_link | True |
| 95182344 | 95247838 | 0 | 10729877 | 2 | Santa Paula Freeway | motorway | 65 mph | True | False | 2908.449 | LINESTRING (-119.18669 34.27521, -119.18207 34... | CA 126 | NaN | NaN | NaN | NaN | NaN | motorway | True |
| 95187446 | 562638742 | 0 | 76561348 | 4 | East Thompson Boulevard | primary | 35 mph | False | False | 139.588 | LINESTRING (-119.29155 34.27827, -119.29146 34... | US 101 BUS | NaN | NaN | NaN | NaN | NaN | primary | True |
| 902353408 | 0 | 76561348 | 4 | East Thompson Boulevard | primary | 35 mph | False | True | 33.903 | LINESTRING (-119.29155 34.27827, -119.29164 34... | US 101 BUS | NaN | NaN | NaN | NaN | NaN | primary | True | |
| 902353746 | 0 | 186246480 | NaN | South Chestnut Street | primary | NaN | False | False | 27.744 | LINESTRING (-119.29155 34.27827, -119.29155 34... | NaN | NaN | NaN | NaN | NaN | NaN | primary | True | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11746606869 | 370197560 | 0 | 1264315364 | NaN | NaN | motorway_link | NaN | False | False | 22.198 | LINESTRING (-120.65807 35.28946, -120.65810 35... | NaN | NaN | NaN | NaN | NaN | NaN | motorway_link | True |
| 11746606870 | 0 | 1264315365 | NaN | NaN | motorway_link | NaN | False | True | 57.224 | LINESTRING (-120.65807 35.28946, -120.65807 35... | NaN | NaN | NaN | NaN | NaN | NaN | motorway_link | True | |
| 11746606870 | 1969499984 | 0 | 186263631 | NaN | NaN | motorway_link | NaN | False | True | 15.034 | LINESTRING (-120.65807 35.28998, -120.65807 35... | NaN | NaN | NaN | NaN | NaN | NaN | motorway_link | True |
| 91429025 | 0 | 1031908894 | NaN | NaN | motorway_link | NaN | True | False | 84.743 | LINESTRING (-120.65807 35.28998, -120.65813 35... | NaN | NaN | NaN | NaN | NaN | NaN | motorway_link | True | |
| 11746606869 | 0 | 1264315365 | NaN | NaN | motorway_link | NaN | False | False | 57.224 | LINESTRING (-120.65807 35.28998, -120.65807 35... | NaN | NaN | NaN | NaN | NaN | NaN | motorway_link | True |
5336 rows × 17 columns
# Check the crs of major_roads
major_roads.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
# Interactive plot with the major roads. Do you agree that those are major roads?
import folium
# Create a Folium map centered around the mean latitude and longitude of the GeoDataFrame
map_center = [major_roads.unary_union.centroid.y, major_roads.unary_union.centroid.x]
m = folium.Map(location=map_center, zoom_start=7, control_scale=True)
# Define a style function to set colors based on the binary column
def style_function(feature):
color = 'red' if feature['properties']['selected'] == 1 else 'blue'
return {'fillColor': color, 'color': color, 'weight': 1, 'fillOpacity': 0.6}
tooltip = folium.GeoJsonTooltip(
fields=["highway_"]
)
# Add GeoDataFrame to the map with the defined style and tooltip functions
folium.GeoJson(
major_roads,
tooltip=tooltip,
name="Major Roads"
).add_to(m)
folium.LayerControl().add_to(m)
# Display the map
m
Step 2. Relate Ignition locations with Major Roads¶
# Load a shapefile, "Lab1Data/SBC_2007_2021_Wildland_Ignitions.shp" into a variable, "ignitions"
ignitions = gpd.read_file("Lab1Data/SBC_2007_2021_Wildland_Ignitions.shp")
major_roads = major_roads.to_crs(ignitions.crs)
# Plot the major_roads and ignitions together
# Create a new column, "mr_distance" in ignitions GeoDataFrame.
# The column contains the distance from each ignition point to the closest major_roads segment.
# Divide the distance values with 5280 (feet -> miles)
ignitions["mr_distance"] = ignitions.distance(major_roads.unary_union)
ignitions["mr_distance"] = ignitions["mr_distance"]/5280 # feet to miles
A = ignitions.plot("mr_distance", legend=True)
major_roads.plot(ax=A)
<Axes: >
# Why it doesn't work? What should you check?
# Yes. Coordinate System.
# Match the major_roads crs with the ignitions crs.
# Create a new column, "mr_distance" in ignitions GeoDataFrame.
# The column contains the distance from each ignition point to the closest major_roads segment.
# Divide the distance values with 5280 (feet -> miles)
# Plot the result!
# Now, let's draw a histogram showing the distribution of Distance.
# Create a histogram
plt.figure(figsize=(10, 6))
ignitions.mr_distance.hist(bins=40, color='skyblue', edgecolor='black', alpha=0.7)
# Add titles and labels
plt.title('Histogram of Ignition to Major Road Distance')
plt.xlabel('Distance to Major Road (miles)')
plt.ylabel('Frequency')
# Add grid
plt.grid(axis='y', linestyle='--', alpha=0.7)
# Add a background color
plt.gca().set_facecolor('#f5f5f5')
# Add a legend (optional)
plt.legend(['Distance from Ignition to Major Road'])
# Show plot
plt.show()
# It's hard to see the result because of an outlier in Channel Island.
# What will be a good way to drop it?
# Drop the furthest ignition point from Major roads, because it's in Channel Islands.
ignitions = ignitions.drop(ignitions.mr_distance.argmax())
# After dropping the outlier, create a histogram once again.
# Create a histogram
plt.figure(figsize=(10, 6))
ignitions.mr_distance.hist(bins=40, color='skyblue', edgecolor='black', alpha=0.7)
# Add titles and labels
plt.title('Histogram of Ignition to Major Road Distance')
plt.xlabel('Ignition to Major Road Distance (miles)')
plt.ylabel('Frequency')
# Add grid
plt.grid(axis='y', linestyle='--', alpha=0.7)
# Add a background color
plt.gca().set_facecolor('#f5f5f5')
# Add a legend (optional)
#plt.legend(['Distance from Ignition to Major Road'])
# Show plot
plt.show()
# This time, visualize the ignition points with the distance value, and also plot major_roads together.
A = ignitions.plot("mr_distance",legend=True, figsize=(10,10), cmap="RdYlGn_r", legend_kwds={'shrink': 0.3})
major_roads.plot(ax=A, color="darkgrey", lw=0.5)
plt.axis("off")
plt.title("Distance from Ignition to Major Road (miles)")
Text(0.5, 1.0, 'Distance from Ignition to Major Road (miles)')
Step 3. Area Burnt and Distance to Major Roads¶
Also, let's check out if the distance to Major Road is related with the area burnt by the fire.
What would be a good way to do so?
# Check out the columns contained in ignitions GeoDataFrame.
# Which column is having the area burnt information?
ignitions.columns
Index(['Id', 'Alarm_Date', 'Alarm_Time', 'Cause', 'Contain_Da', 'Contain_Ti',
'Incident_L', 'Incident_N', 'Acres', 'POINT_X', 'POINT_Y', 'DPA',
'FED_Cause', 'Incident_1', 'geometry', 'mr_distance'],
dtype='object')
# Draw scatterplot with the mr_distance and area burnt columns.
plt.scatter(ignitions["mr_distance"], ignitions["Acres"])
<matplotlib.collections.PathCollection at 0x28d894c10>
# Using a threshold, select a subset of ignitions to exclude outliers.
ignitions_wo_outliers = ignitions.loc[ignitions["mr_distance"] <8]
ignitions_wo_outliers = ignitions.loc[ignitions["Acres"] < 750]
# Draw scatterplot with the mr_distance and area burnt columns.
FIGSIZE = (10,5)
plt.figure(figsize=FIGSIZE)
plt.scatter(ignitions_wo_outliers["mr_distance"], ignitions_wo_outliers["Acres"])
plt.xlabel("Distance to Major Road")
plt.ylabel("Area Burnt in Acres")
plt.title("Scatter plot of Area Burnt in Comparison to Distance to Major Road")
# Add titles and labels.
Text(0.5, 1.0, 'Scatter plot of Area Burnt in Comparison to Distance to Major Road')
# Another way is to do IQR
# This time, let's use another way to drop the outliers. Use the IQR based approach.
# You can ask chatgpt for the function.
import numpy as np
def remove_outliers_iqr(data):
"""
Remove outliers from a dataset using the Interquartile Range (IQR) method.
Parameters:
- data: A list or NumPy array containing the dataset.
Returns:
- Cleaned dataset with outliers removed.
"""
# Convert data to NumPy array for easier manipulation
data = np.array(data)
# Calculate the first quartile (Q1)
Q1 = np.percentile(data, 25)
# Calculate the third quartile (Q3)
Q3 = np.percentile(data, 75)
# Calculate the interquartile range (IQR)
IQR = Q3 - Q1
# Define the lower bound and upper bound for outlier detection
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Filter out the outliers
cleaned_index = (data >= lower_bound) & (data <= upper_bound)
return cleaned_index
ignitions_wo_outliers = ignitions.loc[remove_outliers_iqr(ignitions["Acres"])]
# This time, let's use another way to drop the outliers. Use the IQR based approach.
# You can ask chatgpt for the function.
# Visualzie the scatter plot again. Don't forget to add titles and labels.
plt.figure(figsize=FIGSIZE)
plt.scatter(ignitions_wo_outliers["mr_distance"], ignitions_wo_outliers["Acres"])
plt.xlabel("Distance to Major Road")
plt.ylabel("Area Burnt in Acres")
Text(0, 0.5, 'Area Burnt in Acres')
# After dropping the outlier, create a histogram once again.
# Create a histogram
plt.figure(figsize=(10, 6))
ignitions_wo_outliers.mr_distance.hist(bins=40, color='skyblue', edgecolor='black', alpha=0.7)
# Add titles and labels
plt.title('Histogram of Ignition to Major Road Distance')
plt.xlabel('Ignition to Major Road Distance (miles)')
plt.ylabel('Frequency')
# Add grid
plt.grid(axis='y', linestyle='--', alpha=0.7)
# Add a background color
plt.gca().set_facecolor('#f5f5f5')
# Add a legend (optional)
#plt.legend(['Distance from Ignition to Major Road'])
# Show plot
plt.show()
Final Report ( Total 30 pts )¶
Chapter 1.¶
Submit your final map visualization for the "YearBuilt" column.
Ensure your map includes necessary elements such as title, legend, and scale bar.
Do not turn off the axis.
Utilize a suitable colormap. ( 3 pts )Load a new GeoDataFrame, "Lab1Data/Parcel_IV.shp".
Visualize its NetSecValue column with Scalebar.
You shouldn't turn off the axis for this map. ( 4 pts )Report the number of SB parcels selected in the given Knapsack Problem, aimed at maximizing the number of bedrooms within a budget of $1,000,000. ( 1 pt )
Report the number of bedrooms that can be obtained in the given Knapsack Problem, with the budget of $1,000,000. ( 1 pt )
Discuss the presence of null values in the "YearBuilt" column.
Hint: Utilize the Folium library to overlay parcels with null or not-null values on a basemap. ( 2 pts )In a knapsack problem, what happens if the objective function sets "MINIMIZE" rather than "MAXIMIZE"? ( 2 pt )
How many bathrooms does the parcel with zero bedrooms have? ( 1 pt )
In a knapsack problem, what happens if there's no constraint? Hint: Comment out the setConstraint row ( 2 pts )
Evaluate whether the current knapsack approach is appropriate or not, and provide reasoning for your conclusion. ( 2 pts )
Design a new Knapsack problem scenario, referring to the description.txt file for details on parcel data columns.
Explain the scenario you have created. ( 2 pts )Run the Knapsack Problem code with your custom scenario.
Visualize the results in an interactive plot using Folium.
Ensure that the color scheme is different from blue/red.
(Optional) You can try the different parcels in Isla Vista region (parcl_IV.shp) instead of the parcels used for instruction. ( 4 pts )
Chapter 2.¶
Submit a screenshot of your histogram and scatter plot without outliers.
Your plots must have title, x and y labels. (4 pts)What insight can you get out of the map, histogram and scatterplot you created in Chapter 2?
How can you utilize the insights for the urban wildfire risk management? ( 2 pts )
#Question 2: Load a new GeoDataFrame, "Lab1Data/Parcel_IV.shp".
parcels = gpd.read_file("Lab1Data/parcel_IV.shp")
# Plot parcel data
parcels.plot()
<Axes: >
#visualize NetSecValue with scalebar
ax = parcels.plot("NetSecVal", legend=True)
#add a scale bar
scalebar = ScaleBar (1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
<matplotlib_scalebar.scalebar.ScaleBar at 0x2886cc850>